18 research outputs found

    eHive: An Artificial Intelligence workflow system for genomic analysis

    Background: The Ensembl project produces updates to its comparative genomics resources with each of its several releases per year. During each release cycle approximately two weeks are allocated to generate all the genomic alignments and the protein homology predictions. The number of calculations required for this task grows approximately quadratically with the number of species. We currently support 50 species in Ensembl and we expect the number to continue to grow in the future. Results: We present eHive, a new fault tolerant distributed processing system initially designed to support comparative genomic analysis, based on blackboard systems, network distributed autonomous agents, dataflow graphs and block-branch diagrams. In the eHive system a MySQL database serves as the central blackboard and the autonomous agent, a Perl script, queries the system and runs jobs as required. The system allows us to define dataflow and branching rules to suit all our production pipelines. We describe the implementation of three pipelines: (1) pairwise whole genome alignments, (2) multiple whole genome alignments and (3) gene trees with protein homology inference. Finally, we show the efficiency of the system in real case scenarios. Conclusions: eHive allows us to produce computationally demanding results in a reliable and efficient way with minimal supervision and high throughput. Further documentation is available at: http://www.ensembl.org/info/docs/eHive/.
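
    The blackboard-and-agent design described in this abstract can be illustrated with a minimal sketch. It is not eHive's actual schema or API: Python stands in for the Perl agent, the standard-library sqlite3 module stands in for MySQL, and the table layout and the names make_blackboard, claim_job and run_worker are hypothetical. The sketch also shows the quadratic growth the abstract mentions, since one pairwise job is created per unordered pair of species.

```python
import sqlite3
from itertools import combinations


def make_blackboard(species):
    """Create an in-memory job table (hypothetical schema) with one job per species pair."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE job (id INTEGER PRIMARY KEY, analysis TEXT, "
        "input TEXT, status TEXT DEFAULT 'READY')"
    )
    # n species give n * (n - 1) / 2 unordered pairs, hence quadratic growth.
    pairs = [(f"{a}:{b}",) for a, b in combinations(species, 2)]
    db.executemany(
        "INSERT INTO job (analysis, input) VALUES ('pairwise_align', ?)", pairs
    )
    db.commit()
    return db


def claim_job(db):
    """Claim one READY job and return it, or return None when none is left."""
    while True:
        row = db.execute(
            "SELECT id, analysis, input FROM job "
            "WHERE status = 'READY' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # The status guard keeps the claim safe if another agent got there first.
        cur = db.execute(
            "UPDATE job SET status = 'CLAIMED' WHERE id = ? AND status = 'READY'",
            (row[0],),
        )
        db.commit()
        if cur.rowcount == 1:
            return row


def run_worker(db, handlers):
    """Agent loop: claim and run jobs until the blackboard is empty."""
    processed = 0
    while (job := claim_job(db)) is not None:
        job_id, analysis, payload = job
        try:
            handlers[analysis](payload)
            status = "DONE"
        except Exception:
            status = "FAILED"  # keep a record so the job could be retried later
        db.execute("UPDATE job SET status = ? WHERE id = ?", (status, job_id))
        db.commit()
        processed += 1
    return processed


if __name__ == "__main__":
    board = make_blackboard([f"species_{i}" for i in range(50)])
    n_jobs = run_worker(board, {"pairwise_align": lambda pair: None})
    print(n_jobs, "pairwise jobs processed")
```

    Because every agent reads from and writes to the same table, agents can be added or can fail independently, which is one plausible reading of the fault tolerance the abstract claims; the dataflow and branching rules it mentions are not modeled here.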

    Finishing the euchromatic sequence of the human genome

    The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ∼99% of the euchromatic genome and is accurate to an error rate of ∼1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.

    4to. Congreso Internacional de Ciencia, Tecnología e Innovación para la Sociedad. Memoria académica

    This volume contains the academic proceedings of the fourth edition of the International Congress of Science, Technology and Innovation for Society (Congreso Internacional de Ciencia, Tecnología e Innovación para la Sociedad, CITIS 2017), held from 29 November to 1 December 2017 and organized by the Universidad Politécnica Salesiana (UPS) at its Guayaquil campus. The Congress offered a space for presenting, disseminating and exchanging important national and international research before the university community that gathered at the event. The use of technological tools to manage the research papers, such as the Open Conference Systems platform and the Congress website http://citis.blog.ups.edu.ec/, made CITIS 2017 a true benchmark among the congresses held in the country. Our University's concern to provide spaces that help generate new and better changes in the human and social dimension of our environment means that each edition of the event seeks to present work of increasing scientific quality. Those of us who led the organization have set down in these proceedings the intense and prolific work of the days of the Congress, making it available to everyone.

    Numerous Novel Annotations of the Human Genome Sequence Supported by a 5′-End–Enriched cDNA Collection

    A collection of 90,000 human cDNA clones generated to increase the fraction of “full-length” cDNAs available was analyzed by sequence alignment on the human genome assembly. Five hundred fifty-two gene models not found in LocusLink, with coding regions of at least 300 bp, were defined by using this collection. Exon composition proposed for novel genes showed an average of 4.7 exons per gene. In 20% of the cases, at least half of the exons predicted for new genes coincided with evolutionarily conserved regions defined by sequence comparisons with the pufferfish Tetraodon nigroviridis. Among this subset, CpG islands were observed at the 5′ end of 75%. In-frame stop codons upstream of the initiator ATG were present in 49% of the new genes, and 16% contained a coding region comprising at least 50% of the cDNA sequence. This cDNA resource also provided candidate small protein-coding genes, usually not included in genome annotations. In addition, analysis of a sample from this cDNA collection indicates that ∼380 gene models described in LocusLink could be extended at their 5′ end by at least one new exon. Finally, this cDNA resource provided experimental support for annotations based exclusively on predictions, thus representing a resource that substantially improves the human genome annotation.